Managing Keyword Variation with Frequency Based Generation of Word Forms in IR

نویسنده

  • Kimmo Kettunen
چکیده

This paper presents a new management method for morphological variation of keywords. The method is called FCG, Frequent Case Generation. It is based on the skewed distributions of word forms in natural languages and is suitable for languages that have either fair amount of morphological variation or are morphologically very rich. The proposed method has been evaluated so far with four languages, Finnish, Swedish, German and Russian, which show varying degrees of morphological complexity.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Is a Morphologically Complex Language Really that Complex in Full-Text Retrieval?

In this paper we show that keyword variation of a morphologically complex language, Finnish, can be handled effectively for IR purposes by generating only the textually most frequent forms of the keyword. Theoretically Finnish nouns have about 2,000 different forms, but occurrences of most of the forms are rare. Corpus statistics showed that about 84 – 88 per cent of the occurrences of inflecte...

متن کامل

Experiments on Automatic Web Page Categorization for IR system

This paper describes keyword-based Web page categorization. Our goal is to embed our categorization technique into information retrieval (IR) systems to facilitate the end-users’ search task. In such systems, search results must be categorized faster, while keeping accuracy high. Our categorization system uses a knowledge base (KB) to assign categories to Web pages. The KB contains a set of cha...

متن کامل

Automatic Generation of Frequent Case Forms of Query Keywords in Text Retrieval

This paper presents implementations of generative management method for morphological variation of query keywords. The method is called FCG, Frequent Case Generation. It is based on the skewed distributions of word forms in natural languages and is suitable for languages that either have fair amount of morphological variation or are morphologically very rich. The paper reports implementation an...

متن کامل

بهبود کارایی سیستم کاوشگر کلمات تلفنی با استفاده از نرمالیزاسیون امتیاز اطمینان مبتنی بر روش برنامه‌ریزی خطی

Conventional word spotting systems determine hypothesized keywords and their confidence score using a speech recognizer. Acceptance or rejection of these keywords is intended based on comparison of their scores with a specific threshold. It has been proved that confidence score prepared by recognizer is highly dependent on sub-word structure of each keyword. So comparing assigned scores to keyw...

متن کامل

BIC based on Modified Droop Control of Hybrid AC/DC Microgrid with PV/Wind/ESS under Variable Generation and Load Conditions

The idea of a microgrid is created by utilizing more diverse ac or dc distributed generation (DG) sources along with an energy storage system (ESS) and loads. The most efficient and reliable selection of ac and dc microgrids is a hybrid ac/dc microgrid. The hybrid microgrid largely overcomes the shortcomings of standalone ac or dc microgrids. A bidirectional interlinking converter (BIC) is util...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007